Automatic Translation Memory Fuzzy Match Post-Editing: A Step Beyond Traditional TM/MT Integration
Abstract
An innovative way of integrating Translation Memory (TM) and Machine Translation (MT) processing is presented, which goes beyond the traditional cascade integration of the two. The new method aims to automatically post-edit TM similar matches by means of an MT module, thus improving the TM fuzzy (similar) match scores and enabling the use of low-score TM fuzzy matches. This leads to substantial translation cost reduction. The suggested method, which can be classified as an Example-Based Machine Translation application, is analysed, and examples are provided for clarification. It is evaluated through test results that involve human interaction. The method has been implemented within the ESTeam Translator (ET) Language Toolbox and is already in use in the various commercial installations of ET.

1. Automatic Translation Memory Fuzzy Match Post-Editing

According to the standard TM paradigm (Nagao, 1984), an input text unit (usually a sentence) to be translated is matched against the source language part of the translation pairs stored in the TM. If an identical (full) or similar (fuzzy) match is located, the system suggests its target language equivalent as the translation of the original text unit and lets the user accept or edit this suggestion so that it corresponds accurately to the translation of the input text unit. When no full or fuzzy match can be located, the option is usually offered to invoke MT processing to translate the input text unit.

The method proposed in this paper can be classified as an Example-Based Machine Translation application (Somers, 1999). It takes TM-MT integration one step further by manipulating the fuzzy match result: MT is invoked (in context) in order to automatically correct the TM-based translation suggestion.

We denote as Sinp-SL the input text unit, for example a sentence, consisting of words to be translated from the Source Language (SL) into the Target Language (TL).
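The standard TM lookup described above can be sketched as follows. This is a minimal illustration, not the ESTeam Translator implementation: the word-level `difflib` ratio stands in for a real TM fuzzy score, and the names `fuzzy_score`, `tm_lookup`, the sample `TM` pairs and the 0.7 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

def fuzzy_score(inp_words, ref_words):
    """Word-level similarity in [0, 1]; a stand-in for a real TM fuzzy score."""
    return SequenceMatcher(None, inp_words, ref_words, autojunk=False).ratio()

def tm_lookup(inp_sl, tm_pairs, threshold=0.7):
    """Return (score, ref_sl, ref_tl) for the best TM pair, or None if below threshold."""
    inp_words = inp_sl.split()
    best = None
    for ref_sl, ref_tl in tm_pairs:
        score = fuzzy_score(inp_words, ref_sl.split())
        if best is None or score > best[0]:
            best = (score, ref_sl, ref_tl)
    if best is not None and best[0] >= threshold:
        return best
    return None  # no full/fuzzy match: a system would typically fall back to MT here

# Hypothetical TM contents (source sentence, target sentence).
TM = [
    ("press the red button", "appuyez sur le bouton rouge"),
    ("close the door", "fermez la porte"),
]

full = tm_lookup("press the red button", TM)     # identical (full) match
fuzzy = tm_lookup("press the green button", TM)  # similar (fuzzy) match
none = tm_lookup("open the window", TM)          # nothing similar enough
```

In the fuzzy case the returned target sentence still says "rouge", which is exactly the kind of suggestion a human translator would otherwise have to correct by hand.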
Suppose that the TM contains a text-unit pair, for example sentences again, denoted as Sref-SL and Sref-TL. The standard definition of a fuzzy match translation is that if Sinp-SL is similar to Sref-SL, through the similarity of (some of) their words, then Sref-TL is proposed as the translation of Sinp-SL (to be verified or edited by a human translator). The suggested method exploits the fuzzy match information M(Sinp-SL, Sref-SL), as well as the word-alignment information A(Sref-SL, Sref-TL) of the TM text-unit pair, in order to modify Sref-TL so that it corresponds to the translation of Sinp-SL.

The fuzzy match information M(Sinp-SL, Sref-SL) defines the links between words of Sinp-SL and Sref-SL; in other words, it defines which input-SL word has matched which reference-SL word. This type of information is standard in all TM systems, since it is used to estimate the similarity score of a match.

The word-alignment information A(Sref-SL, Sref-TL), however, is anything but standard. The bottleneck in applying Fuzzy Match Post-Editing is the availability of word-alignment information for the TM contents, which is what enables the appropriate correction of the TL reference text units. Word-alignment information defines the translation links between words of the reference-SL and reference-TL text units (the TM pair); in other words, it defines which word or phrase of Sref-SL translates to which word or phrase of Sref-TL (and can, in general, include phrases with non-consecutive words). This information, which is not necessarily exhaustive, can either be calculated on-line (by looking up an MT dictionary) or be pre-stored in the TM. In the ESTeam Translator system, word-alignment information is available through a process of automatically aligning text units at various text levels (paragraphs, sentences, sub-sentences) (Meyers, 1998; Ahrenberg et al., 2000) by the use of (among other resources) an MT Dictionary of words and phrases.
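To make the two kinds of information concrete, the sketch below represents M(Sinp-SL, Sref-SL) and A(Sref-SL, Sref-TL) as index maps and uses them to locate the target words that a correction step would need to touch. It is only an illustration under stated assumptions: the links in M are derived with `difflib` rather than a real TM matcher, the alignment A is written by hand rather than produced from an MT dictionary, and the helper names `match_links` and `target_positions` are hypothetical.

```python
from difflib import SequenceMatcher

def match_links(inp_words, ref_words):
    """M(Sinp-SL, Sref-SL): input word index -> matched reference-SL word index."""
    sm = SequenceMatcher(None, inp_words, ref_words, autojunk=False)
    links = {}
    for block in sm.get_matching_blocks():  # the final block has size 0
        for k in range(block.size):
            links[block.a + k] = block.b + k
    return links

def target_positions(M, A, ref_len):
    """Sref-TL indices aligned to reference-SL words that no input word matched."""
    matched_ref = set(M.values())
    unmatched_ref = [j for j in range(ref_len) if j not in matched_ref]
    return sorted({t for j in unmatched_ref for t in A.get(j, [])})

inp = "press the green button".split()        # Sinp-SL
ref_sl = "press the red button".split()       # Sref-SL
ref_tl = "appuyez sur le bouton rouge".split()  # Sref-TL

M = match_links(inp, ref_sl)

# A(Sref-SL, Sref-TL), hand-written for this example; it may be partial in practice.
# press -> appuyez sur, the -> le, red -> rouge, button -> bouton
A = {0: [0, 1], 1: [2], 2: [4], 3: [3]}

unmatched_inp = [i for i in range(len(inp)) if i not in M]  # input words with no link
to_fix = target_positions(M, A, len(ref_sl))                # TL positions to correct
words_to_fix = [ref_tl[t] for t in to_fix]
```

Here the only unmatched input word is "green", and the only target word flagged for correction is "rouge". How the replacement text is produced (the MT step) is outside the scope of this sketch.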
The MT Dictionary defines the relevance of two text units being compared (by defining translation links between their words) and then marks the corresponding word-alignment information to be used later for the application of Fuzzy Match Post-Editing. The basic idea of Fuzzy Match Post-Editing is quite simple; it is depicted graphically in Figure 1 for an example involving all supported actions.

Insertion(s) of Word(s)

The method identifies mismatched words in Sinp-SL and, based on the fuzzy match information M(Sinp-SL, Sref-SL), which provides anchor points in the vicinity of these mismatched words, tries to identify the corresponding missing word positions in Sref-SL. It then searches in A(Sref-SL, Sref-TL) for potential available word-alignment
Publication date: 2004